PatentCom: A Comparative View of Patent Document Retrieval
نویسندگان
چکیده
Patent document retrieval, as a recall-orientated search task, does not allow missing relevant patent documents due to the great commercial value of patents and significant costs of processing a patent application or patent infringement case. Thus, it is important to retrieve all possible relevant documents rather than only a small subset of patents from the top ranked results. However, patents are often lengthy and rich in technical terms, and it often requires enormous human efforts to compare a given document with retrieved results. In this paper, we formulate the problem of comparing patent documents as a comparative summarization problem, and explore automatic strategies that generate comparative summaries to assist patent analysts in quickly reviewing any given patent document pairs. To this end, we present a novel approach, named PatentCom, which first extracts discriminative terms from each patent document, and then connects the dots on a term co-occurrence graph. In this way, we are able to comprehensively extract the gists of the two patent documents being compared, and meanwhile highlight their relationship in terms of commonalities and differences. Extensive quantitative analysis and case studies on real world patent documents demonstrate the effectiveness of our proposed approach.
منابع مشابه
Revisiting Document Length Hypotheses: NTCIR-4 CLIR and Patent Experiments at Patolis
NTCIR-4 experiments of CLIR J-J and Patent tasks, focusing on comparative studies of two testcollections and two retrieval approaches in view of document length hypotheses are described. TF*IDF outperformed the language modeling approach in the CLIR J-J task while two approaches performed similarly in the Patent task. Two different document length hypotheses behind two tasks/collections are ass...
متن کاملExploring Structured Documents and Query Formulation Techniques for Patent Retrieval
This paper presents the experiments and results of DCU in CLEF-IP 2009. Our work applied standard information retrieval (IR) techniques to patent search. Different experiments tested various methods for the patent retrieval, including query formulation, structured index, weighted fields, document filtering, and blind relevance feedback. Some methods did not show expected good retrieval effectiv...
متن کاملDocument Image Retrieval Based on Keyword Spotting Using Relevance Feedback
Keyword Spotting is a well-known method in document image retrieval. In this method, Search in document images is based on query word image. In this Paper, an approach for document image retrieval based on keyword spotting has been proposed. In proposed method, a framework using relevance feedback is presented. Relevance feedback, an interactive and efficient method is used in this paper to imp...
متن کاملPOSTECH at NTCIR-5 Patent Retrieval: Smoothing Experiments in a Language Modeling Approach to Patent Retrieval
This report describes the experimental results of our participation at the Document Retrieval Subtask of NTCIR-5 Patent Retrieval Task. Unlike newspaper articles which belong to the main document type handled in previous information retrieval experiments, patent documents have many different characteristics in terms of length, technicality, structureness, etc. Among these, we focus on the lengt...
متن کاملDocument image classification, with a specific view on applications of patent images
The main focus of this paper is document image classification and retrieval, where we analyze and compare different parameters for the RunLeght Histogram (RL) and Fisher Vector (FV) based image representations. We do an exhaustive experimental study using different document image datasets, including the MARG benchmarks, two datasets built on customer data and the images from the Patent Image Cl...
متن کامل